India is a multilingual, multi-script country. In every state of India, two languages are in use: the local state language and English. For example, in Andhra Pradesh, a state in India, a document may contain text words in both English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, it is necessary to identify the script before feeding the text words to the OCRs of the individual scripts. In this paper, we introduce a simple and efficient technique for script identification of Kannada, English and Hindi text words in a printed document. The proposed approach uses horizontal and vertical projection profiles to discriminate between the three scripts. Features are extracted from the horizontal projection profile of each text word. We analysed 700 different words of Kannada, English and Hindi in order to extract the discriminating features and to develop the knowledge base. The proposed system was tested on 100 different document images containing more than 1000 text words of each script, and classification rates of 98.25%, 99.25% and 98.87% were achieved for Kannada, English and Hindi, respectively.
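As a rough illustration of the projection profiles mentioned above, the following minimal sketch computes the horizontal and vertical projection profiles of a binarized word image. It assumes the word is already segmented and stored as a list of rows with 1 for ink and 0 for background. The function names and the `peak_row_ratio` feature are hypothetical and chosen only for illustration; the abstract does not list the actual discriminating features or the classification rules.

```python
from typing import List

# Assumed representation: 1 = ink (foreground), 0 = background.
Image = List[List[int]]


def horizontal_projection_profile(image: Image) -> List[int]:
    """Count the ink pixels in each row of the word image."""
    return [sum(row) for row in image]


def vertical_projection_profile(image: Image) -> List[int]:
    """Count the ink pixels in each column of the word image."""
    return [sum(column) for column in zip(*image)]


def peak_row_ratio(image: Image) -> float:
    """Hypothetical feature: relative position (0.0 = top row, 1.0 = bottom
    row) of the row with the most ink. Not taken from the paper."""
    profile = horizontal_projection_profile(image)
    if len(profile) < 2:
        return 0.0
    peak = profile.index(max(profile))
    return peak / (len(profile) - 1)


# Toy 5x6 "word" with a dense stroke in the second row.
word = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
]

hpp = horizontal_projection_profile(word)
vpp = vertical_projection_profile(word)
ratio = peak_row_ratio(word)
print(hpp, vpp, ratio)
```

In a real pipeline the binary image would come from thresholding a scanned page and segmenting it into words, and features derived from such profiles would then be compared against a knowledge base built from labelled training words.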